Building of Networks of Natural Hierarchies of Terms Based on Analysis of Texts Corpora
Author
Abstract
A method is proposed for building a network of natural term hierarchies, which may be regarded as a "quasiontology", i.e. the basis for forming the corresponding terminological ontology. The network is based on "significantly informative" text elements: reference words and phrases. The methodology for identifying such terms is given in [1, 2]. These elements can be used to form search images and to cover whole knowledge bases for the subsequent construction of a common ontology. Reference words and phrases for the natural term hierarchy are selected taking their discriminant power into account. However, this property alone is not sufficient for constructing thesauri and ontologies. Sometimes words with low discriminant power, in particular the most frequent words of the given subject area (e.g., the words "Information", "Retrieval", "Search" in the information retrieval domain), are essential for the task at hand.
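The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not define how discriminant power is measured, so inverse document frequency (IDF) is used here purely as an assumed stand-in; the function name `select_reference_terms` and its parameters `idf_threshold` and `top_frequent` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def select_reference_terms(docs, idf_threshold=0.5, top_frequent=1):
    """Pick candidate reference terms from tokenized documents.

    docs: list of token lists, one per document.
    IDF is an ASSUMED proxy for discriminant power; the paper's
    actual measure is described in its references [1, 2].
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents containing each term
    term_freq = Counter()  # total occurrences of each term in the corpus
    for tokens in docs:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    # Terms that are rare across documents get a high IDF score.
    idf = {t: math.log(n_docs / df) for t, df in doc_freq.items()}
    discriminative = {t for t, score in idf.items() if score >= idf_threshold}

    # Also keep the most frequent terms of the subject area, even though
    # their discriminant power (under this proxy) is low.
    frequent = {t for t, _ in term_freq.most_common(top_frequent)}

    return discriminative | frequent
```

With a toy corpus in which "search" occurs in every document, the IDF filter alone would drop it, while the `top_frequent` step keeps it. This mirrors the abstract's point that frequent domain words can be essential despite low discriminant power.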
Similar resources
Building domain specific lexical hierarchies from corpora
In this article, we present a new algorithm for building domain specific lexical hierarchies from texts. The basic elements of such a hierarchy are the normalized terms – mono and multi-word terms – extracted from a large corpus by a terminological extractor. The algorithm relies on collocations for representing the meaning of these terms, finding hierarchical relations between them and finally...
Comparative Study of the Academic Vocabulary Content of Electronic Engineering Corpora, GE Materials and M.S. Entrance Examinations
The importance of vocabulary learning has been underlined in the field of English for Academic Purposes (EAP) because non-English majors who require reading English texts in their fields of study have to expand their English vocabulary knowledge much more efficiently than ordinary ESL/EFL learners. Since academic vocabulary instruction in Iranian universities is realized through the use of Gene...
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in Russian Unified State Exam on English language. Texts that formed small research corpora were retrieved from 2 resources: official USE database as a reference point, and popular website used by pupils for USE training “Neznaika” (https://neznaika.pro/). The size of two corpora is balanced: USE has 11934 tokens and “Neznaika” - 11918 tokens. We share Biber’...
Ontologies, Taxonomies, Thesauri: Learning from Texts
The use of ontologies as representations of knowledge is widespread but their construction, until recently, has been entirely manual. We argue in this paper for the use of text corpora and automated natural language processing methods for the construction of ontologies. We delineate the challenges and present criteria for the selection of appropriate methods. We distinguish three major steps in...
The Genre of Landscape and Building-Painting (The Artistic) and the Discourse of Nationalism (The Political), during the Late Qajar and Early Pahlavi Periods: an Analysis Based on “Mediation” and “Totality” in Methodology of Georg Lukács
Landscape and building-painting as a genre is one of the many features of Iranian painting in the last years of the Qajar and the first years of the Pahlavi era. This paper explains the relation between these painting as the particular, following the domination of the discourse of nationalism as the general. To reason this idea and to explain the relationship between these two, Georg Lukács the...
Journal:
- CoRR
Volume: abs/1405.6068
Issue: -
Pages: -
Publication date: 2014